Genetic Epidemiology
Wiley
Preprints posted in the last 90 days, ranked by how well they match Genetic Epidemiology's content profile, based on 14 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit.
Leyden, G. M.; Pagoni, P.; Power, G. M.; Carslake, D.; Richardson, T. G.; Tilling, K.; Hemani, G.; Davey Smith, G.; Sanderson, E.
Genome-wide association studies (GWAS) are conventionally conducted in cohorts spanning a wide age range. These studies typically assume that genetic associations are constant across ages. Some traits, however, may have age-varying genetic associations, which has implications for interpreting genetic effects in downstream applications such as Mendelian randomization (MR). In this study we conducted a series of age-stratified GWAS in UK Biobank participants aged 40-69 years for body mass index (BMI) and three blood pressure traits (systolic, diastolic and pulsatile pressure (PP)) in 2-year age strata (N up to 26,330). We used a meta-regression approach to systematically identify single nucleotide polymorphisms (SNPs) with evidence of age-interaction effects among trait-associated GWAS signals and at additional loci genome-wide. Within an MR framework, we examined the relationships of BMI and blood pressure traits with cardiovascular and cardiometabolic outcomes (type 2 diabetes (T2D), stroke, peripheral artery disease (PAD), heart failure, coronary heart disease and atrial fibrillation). Next, we described the effect of the SNP×Age interaction on those relationships in a modified inverse-variance weighted (IVW) analysis. We identified trait-dependent differential enrichment of age-interaction effects: for example, 10.3% of BMI discovery SNPs showed evidence of an age interaction in our data (at P<0.05), compared with 44.7% for PP. Our downstream MR and modified IVW analyses highlight the influence of age on the genetically predicted relationship between PP and adverse cardiovascular outcomes. For example, an increased rate of change in genetically predicted PP across the age period was associated with higher susceptibility to PAD (interaction odds ratio = 2.71; 95% CI: 2.08-3.53; P = 1.82 × 10^-13).
The data generated in this project provide a valuable resource for further exploration of mechanisms relevant to the genetic architecture of complex traits, and all summary data will be made readily accessible to the research community.
Author Summary
Genetic variants that reliably predict variation in a trait are a valuable tool in genetic epidemiology, offering a means to estimate whether an exposure-outcome relationship is likely to be causal using a method called Mendelian randomization (MR). Typically, MR results are interpreted as the cumulative lifetime effect of the exposure on the outcome. However, growing evidence suggests that the genetic effects on trait variation detected in cross-sectional population studies may, in some scenarios, be age dependent. In this work we aimed to investigate thoroughly whether, and to what extent, the influence of genetics on population-level trait variation changes across adulthood. We addressed this question within a methodological framework based on age-stratified summary-level data, demonstrating that the approach may be widely applicable where individual-level cohort data are not publicly available. We show that age interacts with genetic influences across adulthood in a trait-dependent manner: genetics may have a stronger influence on variation in body mass index measured earlier in life, and on pulsatile pressure later in life. We use the MR and IVW frameworks to further illustrate how the variation in the exposure explained by genetics changes with increasing age. This exploratory work provides insight into the extent to which distinct genetic effects are detectable across adulthood, helping us understand how more precise life-course effects may be genetically proxied in an MR setting.
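As a rough illustration of the two estimators named in this abstract, the sketch below assumes that the age-interaction meta-regression is an inverse-variance weighted linear regression of stratum-specific SNP effects on stratum mid-age, and it shows the standard IVW estimator from summary statistics. The function names, the fixed-effect standard errors and the toy numbers are illustrative assumptions; the authors' modified IVW analysis, which incorporates the SNP×Age interaction, is not reproduced here.

```python
import math


def weighted_age_slope(ages, betas, ses):
    """Inverse-variance weighted regression of per-stratum SNP effects on age.

    Returns the slope (change in SNP effect per year of age) and its
    fixed-effect SE. Assumption: a simple linear meta-regression with known
    sampling variances, not necessarily the authors' exact model.
    """
    w = [1.0 / s ** 2 for s in ses]
    sw = sum(w)
    a_bar = sum(wi * a for wi, a in zip(w, ages)) / sw
    b_bar = sum(wi * b for wi, b in zip(w, betas)) / sw
    sxx = sum(wi * (a - a_bar) ** 2 for wi, a in zip(w, ages))
    sxy = sum(wi * (a - a_bar) * (b - b_bar) for wi, a, b in zip(w, ages, betas))
    return sxy / sxx, math.sqrt(1.0 / sxx)


def ivw_estimate(beta_x, beta_y, se_y):
    """Standard IVW causal estimate (first-order weights) and fixed-effect SE."""
    num = sum(bx * by / s ** 2 for bx, by, s in zip(beta_x, beta_y, se_y))
    den = sum(bx ** 2 / s ** 2 for bx, s in zip(beta_x, se_y))
    return num / den, math.sqrt(1.0 / den)


# Hypothetical example: mid-ages of 2-year strata and SNP effects that grow with age.
mid_ages = [41, 43, 45, 47, 49]
snp_betas = [0.010 + 0.002 * (a - 41) for a in mid_ages]
slope, slope_se = weighted_age_slope(mid_ages, snp_betas, [0.004] * 5)
print(slope, slope_se)
```

A slope clearly different from zero relative to its SE would flag a candidate SNP×Age interaction; in practice a random-effects or multiple-testing-aware version may be preferable.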
Mukhopadhyay, N.; Feingold, E. E.; Brand, H.; Lee, M. K.; Kurtas, E. N.; Sanchis-Juan, A.; Moreno-Uribe, L.; Wehby, G.; Valencia-Ramirez, L. C.; Restrepo Muneton, C. P.; Padilla, C.; Deleyiannis, F.; Poletta, F. A.; Orioli, I. M.; Hecht, J. T.; Buxo, C. J.; Butali, A.; Adeyemo, W. L.; Abebe, M. E.; Vieira, A. R.; Shaffer, J. R.; Murray, J. C.; Weinberg, S. M.; Ruczinski, I.; Leslie-Clarkson, E. J.; Marazita, M. L.
Objective: Our understanding of the genetic causes of non-syndromic orofacial clefts (OFCs) is based largely on genetic studies of common and rare nucleotide variants. Less is known about the role of copy number variations (CNVs), and the studies published to date have been limited to either small samples or targeted genomic regions. The objective of our study is to investigate the contribution of CNVs across the entire genome to OFC risk in a large multi-ancestry cohort. Methods: We used PennCNV on microarray genotyping data to detect CNVs in 10,240 participants (2,484 with clefts, 7,756 unaffected). A total of 70,695 quality-filtered autosomal CNVs (49,660 deletions, 21,035 duplications) were used to assign normal/abnormal copy number status at 67,199 positions in the GRCh37 genome assembly. Genome-wide association was tested between cleft status and copy number status. Results: We observed a highly significant association between OFCs and deletions on chromosome 7p14.1 (p=1.32e-35), driven by participants of Central and South American ancestry (p=1.04e-25), with less significant contributions from participants of European (p=3.37e-08) and Asian (p=0.01) ancestry. We also observed four other loci with p-values below 10e-04. Conclusion: The 7p14.1 association observed in our study replicates two prior studies in independent cohorts of European ancestry. However, this locus lies in a T-cell receptor region that is subject to somatic rearrangements whose frequency decreases with age, which may affect genetic association results. Our data show age effects as well as differences between blood and saliva samples. Thus, our results can be interpreted either as supporting a previously established association with orofacial clefts or as questioning those previous results in favor of a hypothesis about the behavior of somatic rearrangements in T-cell receptor regions.
Gkatzionis, A.; Davey Smith, G.; Tilling, K.
Mendelian randomization is currently implemented mainly by using genetic variants as instrumental variables to investigate the causal effect of an exposure on an outcome of interest. Mendelian randomization studies are robust to confounding bias and reverse causation, but they remain susceptible to selection bias, which can arise, for example, if the exposure or outcome is associated with selection into the study sample. Negative controls are sometimes used to detect biases (typically due to confounding) in observational studies. Here, we focus specifically on Mendelian randomization analyses and discuss the conditions under which a variable can be used as a negative control outcome to detect selection mechanisms that could bias Mendelian randomization estimates. We show that the main requirement is that the negative control outcome is related to confounders of the exposure and outcome. Counterintuitively, the effect of the negative control on selection is of secondary concern; for example, a variable that does not affect selection can be a valid negative control for an outcome that does. We also investigate the conditions under which age and sex can be used as negative control outcomes in Mendelian randomization analyses. In a real-data application, we investigate the pairwise causal relationships between 19 traits using data from the UK Biobank. Treating biological sex as a negative control outcome, we identify selection bias in analyses involving commonly used traits such as alcohol consumption, body mass index and educational attainment.
Zhou, A.; Tian, H.; Patel, A.; Mason, A.; Yang, G.; Hypponen, E.; Burgess, S.
The doubly-ranked non-linear Mendelian randomization method can yield biased estimates when instrument strength varies across individuals because of gene-environment (GxE) interactions. We propose a simple strategy to mitigate this bias: model the GxE interactions and remove the fitted GxE component from the exposure before stratification by the doubly-ranked method. In simulations, the proposed GxE correction eliminated GxE-induced bias under null, linear and non-linear exposure-outcome relationships, and it did not introduce bias even when the effect modifier of the IV-exposure association was a confounder or was correlated with a mediator or collider of the exposure-outcome association. In empirical analyses of serum 25(OH)D, BMI and LDL-C, falsification tests showed bias in the uncorrected doubly-ranked method. With the selected panel of effect modifiers, the extent of bias attenuation achieved by GxE correction varied by exposure. GxE correction was most effective for LDL-C, with further support from analyses using negative controls (age at recruitment and sex) and coronary artery disease as a positive control. These findings provide proof-of-principle evidence that the proposed GxE correction strategy can mitigate GxE-induced bias in practice. Where applicable, we recommend implementing this correction as a sensitivity analysis to assess the robustness of findings from the doubly-ranked method.
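A minimal sketch of the order of operations described above: fit the IV-exposure model with an interaction term, subtract the fitted GxE component from the exposure, then apply doubly-ranked stratification. It assumes a single effect modifier M and a linear model X ~ 1 + Z + M + Z×M, and it handles a final incomplete pre-stratum naively; the helper names and the simulated data are hypothetical, and this is not the authors' implementation.

```python
import random


def ols(rows, y):
    """Least-squares coefficients via the normal equations (fine for a few predictors)."""
    p = len(rows[0])
    aug = [[sum(r[a] * r[b] for r in rows) for b in range(p)]
           + [sum(r[a] * yi for r, yi in zip(rows, y))] for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(p):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [u - f * v for u, v in zip(aug[r], aug[col])]
    return [aug[a][p] / aug[a][a] for a in range(p)]


def gxe_corrected_exposure(z, m, x):
    """Remove the fitted Z*M component from X (assumed model: X ~ 1 + Z + M + Z*M)."""
    coef = ols([[1.0, zi, mi, zi * mi] for zi, mi in zip(z, m)], x)
    return [xi - coef[3] * zi * mi for xi, zi, mi in zip(x, z, m)]


def doubly_ranked_strata(z, x, n_strata):
    """Rank by instrument into pre-strata of size n_strata, then by exposure within each.

    If len(z) is not a multiple of n_strata, the last pre-stratum is smaller
    and only fills the lowest stratum labels (a simplification).
    """
    order = sorted(range(len(z)), key=lambda i: z[i])
    labels = [0] * len(z)
    for start in range(0, len(order), n_strata):
        block = sorted(order[start:start + n_strata], key=lambda i: x[i])
        for k, i in enumerate(block):  # k-th smallest exposure -> stratum k
            labels[i] = k
    return labels


# Hypothetical simulated data: the IV-exposure effect depends on M.
rng = random.Random(1)
z = [rng.gauss(0, 1) for _ in range(1000)]
m = [rng.gauss(0, 1) for _ in range(1000)]
x = [0.5 * zi + 0.3 * mi + 0.4 * zi * mi + rng.gauss(0, 1) for zi, mi in zip(z, m)]
x_corrected = gxe_corrected_exposure(z, m, x)
strata = doubly_ranked_strata(z, x_corrected, n_strata=10)
```

Stratum-specific IV estimates would then be computed within each label; that step is omitted here.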
Yap, A. J. Y.; Hanson, A. L.; Griffith, G. J.; Sanderson, E.
Medication use is common in large-scale population cohorts and can modify phenotypic traits of interest. This can bias effect estimates in genome-wide association studies (GWAS) and affect downstream analyses such as Mendelian randomization (MR). The best way to account for medication use in GWAS is unclear. In this study, we compared seven methods of adjusting for antihypertensive use in a systolic blood pressure (SBP) GWAS of 407,960 White British individuals in the UK Biobank. We found that direct adjustments to measured SBP (adding constants, adding class-specific constants, censored normal regression) generally yielded more genome-wide significant variant associations and unmasked stronger GWAS effect estimates than unadjusted SBP. Adjustment with class-specific constants showed the greatest difference relative to the unadjusted GWAS. Restriction methods, which limit the sample to either untreated individuals or age ranges with low levels of antihypertensive use, had less power owing to reduced sample sizes. Effect estimates in treated individuals were deflated relative to those in untreated individuals, demonstrating the importance of medication adjustment. In MR analyses, we found no substantial differences in inverse-variance weighted (IVW) estimates of the effect of SBP on coronary artery disease when using different exposure GWAS methods. Larger variations in IVW estimates were observed for the causal effect of body mass index on SBP across adjustment approaches. In general, medication use did not substantially affect overall findings. However, bias may arise in MR analyses when the exposure included in the estimation affects the probability of treatment.
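The simplest direct adjustment, adding a fixed constant to measured SBP in treated individuals, and the simplest restriction method can be sketched as below. The 15 mmHg default is a convention commonly used in the blood pressure GWAS literature and is an assumption here; the class-specific values and the rule of summing constants across classes are hypothetical placeholders, not the constants used in this study.

```python
def add_constant(sbp, treated, constant=15.0):
    """Add a fixed constant (mmHg) to SBP for individuals on antihypertensives."""
    return [s + constant if t else s for s, t in zip(sbp, treated)]


def add_class_constants(sbp, drug_classes, class_constants):
    """Add one constant per antihypertensive class an individual uses.

    drug_classes: one set of class names per individual (empty set = untreated).
    class_constants: mapping class -> mmHg (hypothetical values for illustration);
    summing across classes is an assumption of this sketch.
    """
    return [s + sum(class_constants.get(c, 0.0) for c in classes)
            for s, classes in zip(sbp, drug_classes)]


def restrict_untreated(sbp, treated):
    """Restriction approach: keep only untreated individuals (smaller N, less power)."""
    return [s for s, t in zip(sbp, treated) if not t]


sbp = [128.0, 145.0, 150.0]
treated = [False, True, True]
print(add_constant(sbp, treated))  # [128.0, 160.0, 165.0]
hypothetical = {"ACE inhibitor": 8.0, "diuretic": 9.0}  # placeholder values
print(add_class_constants(sbp, [set(), {"diuretic"}, {"ACE inhibitor", "diuretic"}], hypothetical))
print(restrict_untreated(sbp, treated))
```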
Herrick, N.; Goovaerts, S.; Manchel, A.; Lee, M. K.; Zhang, X.; Davies, A.; Carlson, J. C.; Leslie-Clarkson, E. J.; Lewis, S. J.; Marazita, M. L.; Cotney, J.; Claes, P.; Shaffer, J. R.; Weinberg, S. M.
Several lines of evidence suggest that normal-range facial features and nonsyndromic orofacial clefts (OFCs) share a genetic basis. Approaches designed to leverage this relationship could reveal new OFC risk loci by boosting discovery power. To test this idea, we applied a pleiotropy-informed GWAS method (cFDR-GWAS) to summary statistics from large, independent European GWASs of normal facial shape (n=4,680; n=3,566) and nonsyndromic cleft lip with or without cleft palate (nsCL/P; n=3,969). The cFDR approach identified 21 independent genomic loci significantly associated with nsCL/P, providing further evidence of the interconnected genetic architecture of these traits. The five original nsCL/P GWAS signals were detected and joined by nine additional loci previously implicated in other OFC association studies. The remaining seven loci represent new nsCL/P genomic regions, and three of these replicated (P < 0.05) in an independent nsCL/P cohort: ASPSCR1, MSX2, and RALYL. A relaxed 10% cFDR-GWAS threshold identified 15 more independent loci with effect sizes comparable to those detected at the strict 5% threshold, two of which replicated: FHOD3 and SMARCA2. Gene expression patterns in major cell types and spatial transcriptomics data highlighted the roles of our candidate genes in craniofacial development. In conclusion, an empirical Bayesian strategy that draws on association signals from genetically related traits can boost the power to identify and prioritize OFC risk loci missed by agnostic gene mapping approaches. These results suggest that the cFDR-GWAS approach may also enhance our understanding of the genetic architecture of other structural birth defects.
Kianersi, S.; Potts, K. S.; Wang, H.; Sofer, T.; Noordam, R.; Rutter, M. K.; Rexrode, K.; Redline, S.; Huang, T.
Introduction: Circadian misalignment is an emerging risk factor for poor cardiovascular health, and chronotype may reflect underlying circadian processes. While previous conventional observational studies have reported adverse associations between evening chronotype and individual cardiovascular risk factors, Mendelian randomization (MR) may provide further insight into the role of chronotype in overall cardiovascular health, as measured by the American Heart Association's Life's Essential 8 (LE8; a composite lifestyle and cardiovascular health score ranging from 0 to 100, with higher scores indicating better health). Methods: We conducted both observational cross-sectional and one-sample MR analyses among 317,730 UK Biobank (UKB) participants of White ethnicity. Chronotype was self-reported and modeled as a continuous variable on a five-level scale from "definitely evening" to "definitely morning". A polygenic risk score including 341 morning chronotype-associated SNPs from a UKB GWAS served as the MR instrument. Two-stage least-squares regression estimated the difference in LE8 per one-unit increment in chronotype towards more morningness, adjusting for age, sex, assessment center, genotyping batch and 40 genetic principal components. To mitigate potential winner's curse bias in UKB due to inflated GWAS estimates, we replicated the analysis in 13,396 White women in the Nurses' Health Study II (NHSII). Results: In UKB, each one-unit increment toward more morningness was associated with a 0.75-point higher multivariable-adjusted LE8 score in observational analysis (95% CI: 0.72, 0.78; P<0.001) and a 0.75-point higher score in MR analysis (95% CI: 0.55, 0.96; P<0.001). MR results were similar for men and women (P-heterogeneity = 0.70).
In NHSII, both estimates were positive, but increased morningness was associated with higher overall LE8 scores in observational analysis (β = 1.57, 95% CI: 1.40, 1.75; P<0.001) and not in MR analysis (β = 0.89, 95% CI: -0.67, 2.44; P = 0.26), although the MR association became significant when the score was based only on behavioral components (β = 2.04; 95% CI: 0.43, 3.65; P = 0.013). Further, morning chronotype was consistently associated with a healthy diet across observational and MR analyses in both cohorts. Conclusions: Our findings suggest a modest causal relationship between morning chronotype and better cardiovascular health profiles, particularly diet quality, although replication in other populations remains necessary.
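With a single polygenic score as the instrument and no covariates, two-stage least squares reduces to the ratio cov(Z, Y)/cov(Z, X). The sketch below makes the two stages explicit. It omits the covariates the study adjusted for (age, sex, assessment center, genotyping batch, principal components), and the toy data are hypothetical, constructed so that a confounder U biases ordinary regression but not the IV estimate. Standard errors taken from a naive second-stage regression are not valid IV standard errors, so none are computed here.

```python
def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def two_stage_least_squares(z, x, y):
    """Single-instrument 2SLS point estimate (no covariates)."""
    # Stage 1: regress the exposure on the instrument and take fitted values.
    b1 = cov(z, x) / cov(z, z)
    a1 = mean(x) - b1 * mean(z)
    x_hat = [a1 + b1 * zi for zi in z]
    # Stage 2: regress the outcome on the fitted exposure.
    return cov(x_hat, y) / cov(x_hat, x_hat)


z = [-2.0, -1.0, 0.0, 1.0, 2.0]   # instrument (e.g. a standardized PRS)
u = [1.0, -2.0, 0.0, 2.0, -1.0]   # confounder, uncorrelated with z in this toy data
x = [2 * zi + ui for zi, ui in zip(z, u)]
y = [0.75 * xi + 3 * ui for xi, ui in zip(x, u)]
print(round(two_stage_least_squares(z, x, y), 6))  # 0.75 (true causal effect)
print(round(cov(x, y) / cov(x, x), 3))             # 1.35 (confounded OLS slope)
```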
Kim, S.; Goo, T.; Park, T.; Park, M.
Polygenic risk scores (PRSs) quantify an individual's genetic susceptibility to complex traits and diseases. Conventional PRSs, which are based on linear models, perform poorly for phenotypes with skewed distributions or with genetic effects that vary across the distribution. We propose a quantile regression-based PRS (QPRS) that can capture quantile-specific genetic effects. Whereas existing PRSs provide only a single score, QPRS models genetic influences at multiple quantiles of the phenotype and improves predictive performance by using these multiple scores as covariates. We evaluate the performance of our method through both simulations and a real-data application. In simulations, QPRS significantly reduces the mean squared error (MSE) compared with the conventional PRS, both in the presence of variance quantitative trait loci and in the presence of outliers. For the real-data analysis, we use data from the Korea Genome and Epidemiology Study (KoGES) to evaluate predictive performance. We consider two prediction tasks: a continuous outcome (glucose level) and a binary outcome (diabetes status, derived from glucose level). For glucose-level prediction, the model incorporating QPRS achieves an R² value 4.69 times higher than the model using conventional PRSs. For predicting diabetes status, the model with QPRS produces an area under the curve 1.06 times higher than the model with conventional PRSs.
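Quantile-specific estimation of the kind QPRS relies on is built on the quantile regression check (pinball) loss. The sketch below shows that loss, confirms by brute force that minimizing it over a constant recovers an empirical quantile that is insensitive to an outlier, and combines per-quantile SNP weights into one score per quantile. The weights are hypothetical inputs; how the authors estimate them and use the scores as covariates is not reproduced here.

```python
def check_loss(residual, tau):
    """Quantile-regression check loss rho_tau(r) = r * (tau - 1[r < 0])."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def best_constant(y, tau):
    """Brute-force minimizer of total check loss over the observed values."""
    return min(y, key=lambda c: sum(check_loss(yi - c, tau) for yi in y))


def quantile_scores(genotypes, weights_by_tau):
    """One polygenic score per quantile: sum_j g_j * w_j(tau).

    genotypes: allele counts (0/1/2) for one individual.
    weights_by_tau: {tau: per-SNP weights}; hypothetical values for illustration.
    """
    return {tau: sum(g * w for g, w in zip(genotypes, ws))
            for tau, ws in weights_by_tau.items()}


outcome = [1.0, 2.0, 3.0, 4.0, 100.0]  # skewed toy outcome
print(best_constant(outcome, 0.5))     # 3.0 (median, unaffected by the outlier)
print(best_constant(outcome, 0.25))    # 2.0
print(quantile_scores([0, 1, 2], {0.25: [0.1, 0.2, 0.3], 0.75: [0.1, 0.5, 0.9]}))
```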
Gunter, N. D.; Cardenas, A.; Kobor, M.; Gladish, N.; Rehkopf, D.; Dow, W.; Rosero-Bixby, L.; Hubbard, A. E.
Epigenetic clocks estimate biological age from DNA methylation patterns at CpG sites and provide robust predictions of mortality and morbidity risk. "Blue zones", regions of exceptional longevity, offer a unique opportunity to investigate how biological aging diverges from chronological age. However, standard clocks are typically trained on large, heterogeneous datasets and therefore reflect average population trends rather than region-specific dynamics. Using data from the Costa Rican Longevity and Healthy Aging Study (CRELES), we profiled DNA methylation in residents of the Nicoya blue zone (n = 206) and a comparison population from other parts of Costa Rica (n = 875). We propose training a SuperLearner, an ensemble machine learning approach, on the non-Nicoyan Costa Ricans to optimize predictive performance across existing clocks and flexible machine learners. Theoretically justified by its oracle property, SuperLearner performs asymptotically as well as the best candidate predictor in the ensemble, yielding a weighted combination of algorithms used to predict age. We then used this trained model to construct a calibrated hypothesis test comparing residual age distributions between the blue zone and the comparison population. Compared with the five top-performing epigenetic clocks (ranked by MSE) in the Costa Rican cohort, only SuperLearner suggested age deceleration (an average of ~1 year) in the non-Nicoyan reference group. Before calibration, SuperLearner showed the strongest evidence of slowed biological aging among blue zone Nicoyans, estimating a three-year reduction [Formula] in epigenetic age. Calibrating with non-Nicoyan Costa Ricans improved the consistency of estimates across all clocks, decreasing the estimated aging advantage in Nicoyans to about two years [Formula]. This approach provides a robust framework for estimating longevity in distinct regions when a relevant comparison population is available.
Li, Y.; Cornejo-Sanchez, D. M.; Dong, R.; Naderi, E.; Wang, G. T.; Leal, S. M.; DeWan, A. T.
The genetic relationship between asthma and lung function may depend on the age at onset (AAO) of asthma. We investigated whether the genetics shared between asthma and lung function depends on AAO. Asthma cases from UK Biobank were divided into subsets by AAO, and genetic correlation was used to define genetically homogeneous groups: ≤20 (LT20), 20-40, and >40 (GT40) years. Association analysis and fine-mapping were performed to identify genetics shared between AAO groups and lung function. Mediation and quantitative trait locus (QTL) analyses were performed to identify mechanisms underlying shared genetic associations. Chr5, chr6, chr12, and chr17 each had one region that displayed a replicated cross-phenotype association with at least one AAO group and lung function. Overlapping credible sets from fine-mapping were observed on chr5 and chr6. Mediation analyses demonstrated that, for each region, the proportion of the effect on lung function mediated through asthma was larger for asthma LT20 than for 20-40 and GT40, suggesting that these regions' effects on lung function were more strongly driven by early-onset asthma. Tissue-specific QTL analysis revealed that the shared etiology on chr5 may act through SLC22A5 and C5orf56, which might play an important role in decreased lung function among individuals with earlier-onset asthma.
Luo, D.; Lussier, A. A.
Prenatal alcohol exposure (PAE) can lead to a range of deficits falling under the umbrella of Fetal Alcohol Spectrum Disorder (FASD), which include a higher risk of adverse neurodevelopmental and mental health outcomes. Although the biological mechanisms linking PAE and mental health remain unclear, DNA methylation (DNAm), an epigenetic modification responsive to environmental exposures, may help explain these relationships. Here, we applied a two-sample Mendelian randomization (MR) framework to assess whether DNAm loci previously associated with PAE or FASD are linked to 11 psychiatric outcomes. Using summary statistics from the Genetics of DNA Methylation Consortium (GoDMC) mQTL database and large-scale GWAS, we analyzed DNAm loci from two epigenome-wide association studies: one examining FASD by Lussier et al. (2018) and one examining PAE patterns by Sharp et al. (2018). A total of 106 associations (Lussier) and 28 associations (Sharp) reached nominal significance (p<0.05) and passed sensitivity tests, with several surviving multiple testing correction. Notably, schizophrenia and bipolar disorder had the highest number of associated loci across both studies. Functional analysis showed that DNAm loci were enriched in signaling pathways, embryonic development, and neuron differentiation. Regional enrichment analysis revealed that FASD-related loci were more likely to occur in enhancers and CpG island south shores, implicating distal regulatory elements. PAE patterns conferred heterogeneous effects on DNAm and mental health risk, underscoring the complexity of timing-specific epigenetic vulnerability. These findings offer novel insights into potential mechanisms by which DNAm links PAE to mental health and demonstrate the utility of MR in epigenetic epidemiology.
Han, J.; Deng, K.; Hong, Z.; Zhang, Z.; Godneva, N.; de Mutsert, R.; van Hylckama Vlieg, A.; Rosendaal, F. R.; Mook-Kanamori, D. O.; Zheng, J.-S.; Chen, Y.; Segal, E.; Li-Gao, R.; DIYUFOOD consortium
Background and Objectives: Recent large-scale studies have consistently linked healthy dietary patterns to improved cardiometabolic health; however, the underlying biological pathways remain largely unclear, especially in non-European populations. In this study, we leveraged data from four population-based cohorts (UK Biobank, NEO study, GNHS, and 10K) to investigate both common and cohort-specific biological pathways linking healthy dietary patterns to cardiometabolic disease through multi-omics profiling. Materials and Methods: In each cohort, we first assessed the association between each of five major dietary pattern scores (AMED, hPDI, DII, AHEI, and EDIH) and cardiometabolic disease risk using Cox or logistic regression models. To explore potential mediation, metabolomic and proteomic measurements were incorporated into the models. All models were adjusted for relevant confounders, and false discovery rate correction was applied to account for multiple testing. Results: Among 71,679 individuals without pre-existing cardiometabolic disease across the four participating cohorts (UKB: 54,024; NEO: 4,838; GNHS: 3,201; 10K: 9,616), we confirmed that adherence to healthy dietary patterns was associated with a 5-10% reduced risk of cardiometabolic disease. Three common biological pathways were identified: (1) mediation via large HDL particles and apolipoprotein F; (2) mediation via DNAJ/Hsp40 and triglyceride-rich lipoproteins; and (3) mediation via CRHBP-regulated HPA axis activity affecting triglyceride-rich lipoproteins. Conclusions: Our integrative multi-omics analysis across diverse populations identifies novel biomarkers that connect healthy dietary patterns with cardiometabolic risk. These findings deepen our understanding of the biological mechanisms underlying diet-related disease and may support the development of precision nutrition interventions.
Radosavljevic, L.; Smith, S.; Nichols, T. E.
The UK Biobank (UKB) Brain Imaging cohort contains data from almost 100,000 subjects and has yielded invaluable understanding of the links between the brain and health outcomes and lifestyles. Much of this understanding has come from exploring associations between Imaging Derived Phenotypes (IDPs) and variables unrelated to brain imaging, so-called non-Imaging Derived Phenotypes (nIDPs). In analyses of this kind, it is very important to control for well-known confounding factors such as age, sex and socio-economic status, as well as confounds related to the imaging protocol itself. In previous work, we created a pipeline for constructing imaging confounds for use in statistical inference via a standard multivariate linear regression approach (Alfaro-Almagro et al. 2021). However, this approach is problematic when the number of confounds exceeds the number of subjects, and it is severely underpowered when the number of subjects is not much larger than the number of confounds. In this work, we perform a simulation study to evaluate 13 modelling approaches to account for confounds when their number is similar to or exceeds the number of subjects. Based on the simulation results, we recommend a ridge regression-based permutation test for low sample sizes (n ≤ 50), a version of de-sparsified LASSO for intermediate sample sizes (50 < n ≤ 500), and multivariate linear regression aided by Principal Component Analysis (PCA) for larger sample sizes (n > 500). We also demonstrate our recommended methodology on a real-data example of finding associations between Alzheimer's Disease (AD) and IDPs.
Opperbeck, A.; Wang, Z.; Rautiainen, I.; Heikkinen, A.; Kaprio, J.; Ollikainen, M.; Sebert, S.; Sillanpaa, E.
Biological ageing begins before birth, with early-life exposures shaping late-life health. These exposures drive health inequities early, yet specific exposures and the composition of the ageing exposome remain largely undefined. This gap may persist because the field lacks agnostic investigations that account for non-linearity, interactions and subtle signals. We aimed to identify exposures accumulated during childhood and adolescence that predict epigenetic ageing, and to explore the composition of the "missing" exposome. In the FinnTwin12 cohort (847 participants measured at ages 12, 14, 17, and 22), over 500 exposures (including lifestyle, green environments, air pollutants, and demographic factors) were analysed using exposome-wide association studies and data-driven ML models (Knockoff Boosted Tree, sNPLS and Boruta). Epigenetic age (blood DNA methylation at age 22) was estimated using GrimAge and DunedinPACE. Our exposure set explains ~28% of the variance in epigenetic age (R² GrimAge = 25.7%; R² DunedinPACE = 30.8%). Predictors of increased epigenetic age included lifestyle and socioeconomic factors (smoking, alcohol use, youth unemployment), alongside green space, while tree cover, vegetation index, neighbourhood age structure and aerial black carbon emerged as predictors of decreased epigenetic age. Twin modelling revealed that the unexplained variance (the missing exposome) consists primarily of environmental factors not shared by twin siblings, distinct from the substantial genetic component captured by our model. Our results underscore the need to expand the exposome approach and model non-linearities to reveal subtle environmental signals accumulating early in life. Because the identified predictors include modifiable systemic factors, they offer opportunities to alter health trajectories and mitigate inequity early on.
Rodriguez-Girondo, M.; Berg, N. v. d.; Hof, M. H.
Defining and quantifying exceptional familial human survival is a persistent challenge in longevity research. Traditional approaches rely on binary thresholds, arbitrary cutoffs, or simple descriptive measures, which discard information on variation among the oldest individuals, ignore differences in background mortality, and yield unstable family-level summaries. We propose a principled, model-based framework that transforms survival times into percentiles relative to population life tables, standardizing across birth cohorts, sexes, and populations. We extend beta mixed-effects regression to accommodate intentional left-censoring, which downweights early deaths while retaining their contribution to the likelihood, thereby focusing inference on extreme survival. Family-specific random effects provide interpretable, statistically grounded longevity scores, overcoming the limitations of ad hoc measures and enabling robust identification of long-lived families. Simulation studies and application to a large multigenerational Dutch cohort demonstrate that the method reliably identifies families enriched for longevity. This framework provides a flexible, interpretable, and robust tool for analyzing familial survival, offering a paradigm shift in the statistical study of exceptional human lifespan.
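The first step of this framework, mapping an age at death to a percentile of the matching cohort- and sex-specific life table, can be sketched as follows. Linear interpolation of the survivorship column, the toy life table and the 0.9 left-censoring threshold are assumptions for illustration only; the beta mixed-effects model itself is not shown.

```python
import bisect


def survival_percentile(age, ages, lx):
    """Fraction of the reference cohort dead by `age`, i.e. 1 - S(age).

    ages: increasing life-table ages; lx: survivors at each age (lx[0] = radix).
    S(age) is linearly interpolated between tabulated ages.
    """
    radix = lx[0]
    if age <= ages[0]:
        return 0.0
    if age >= ages[-1]:
        return 1.0 - lx[-1] / radix
    j = bisect.bisect_right(ages, age)  # ages[j-1] <= age < ages[j]
    frac = (age - ages[j - 1]) / (ages[j] - ages[j - 1])
    l_age = lx[j - 1] + frac * (lx[j] - lx[j - 1])
    return 1.0 - l_age / radix


def percentile_with_censoring(age, ages, lx, threshold=0.9):
    """Return (percentile, left_censored); percentiles below `threshold` are censored."""
    p = survival_percentile(age, ages, lx)
    return p, p < threshold


# Toy life table (hypothetical numbers, not from any real population).
ages = [0, 50, 80, 100, 110]
lx = [100000, 92000, 55000, 2000, 0]
print(percentile_with_censoring(95, ages, lx))   # about 0.8475, censored
print(percentile_with_censoring(105, ages, lx))  # about 0.99, not censored
```

Censoring early deaths rather than dropping them keeps their contribution to the likelihood, which matches the motivation given in the abstract.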
Rossen, J.; Strober, B. J.; Hou, K.; Kerner, G.; Price, A. L.
Understanding the genetic architecture of disease is fundamental to partitioning heritability, polygenic risk prediction and statistical fine-mapping. Genetic architectures of disease in European populations have been shown to depend on European minor allele frequency (MAF): SNPs with lower MAF have larger per-allele effects, owing to negative selection. However, we hypothesized that African MAF (defined using African-ancestry segments in African Americans), which is not distorted by the out-of-Africa bottleneck, might better predict per-allele effect sizes of common variants in European populations, because the common variants explaining most disease heritability are typically much older than the split between African and non-African populations. To test this hypothesis, we first analyzed the proportion of non-synonymous SNPs, which are strongly affected by negative selection. The proportion of non-synonymous SNPs is much better predicted by African MAF than by European MAF; a mixture of African MAF with weight w = 0.95 (95% CI: 0.93, 0.96) and European MAF with weight (1 - w) is a more powerful predictor than either European MAF (P < 10^-15; 3.65x greater increase in log-likelihood relative to a null model without MAF dependence) or African MAF (P < 10^-15). Next, we consider the widely used model in which per-allele GWAS effect-size variance is proportional to $[p_E(1-p_E)]^{\alpha}$, where $p_E$ is the European MAF. We propose a different model in which per-allele effect-size variance is proportional to $[p_{mix}(1-p_{mix})]^{\alpha_{mix}}$, where $p_{mix} = w\,p_A + (1-w)\,p_E$ and $p_A$ is the African MAF. We fit this model by extending the baseline-LD model used in S-LDSC to include a grid of bivariate African and European MAF bins, and identifying the values of $w$ and $\alpha_{mix}$ that best fit mean effect-size variance estimates from S-LDSC across bivariate MAF bins. We show in simulations that our approach provides conservative estimates of $w$.
We applied this approach to summary statistics for 50 diseases/complex traits in European populations (average N = 483K) and estimated best-fit parameters of w = 0.96 (95% CI: (0.76, 1.16)) and $\alpha_{mix} = -0.34$ (95% CI: (-0.67, -0.02)), attaining a far better fit than the standard model using $p_E$ only ($P < 10^{-15}$; 4.53x greater decrease in mean-squared error relative to a null model without MAF dependence). We conclude that, in European populations, per-allele effect sizes for diseases and complex traits depend predominantly on African MAF.
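The two models differ only in which allele frequency enters the variance term. A minimal Python sketch, assuming we only need variances up to the unknown proportionality constant: the function names and the example allele frequencies are hypothetical, and only the point estimates w = 0.96 and α_mix = -0.34 come from the abstract. It does not reproduce the S-LDSC fitting step.

```python
"""Relative per-allele effect-size variance under the MAF-dependent models.

Assumption: variance is only defined up to a proportionality constant,
so the functions return relative (unscaled) variances.
"""


def p_mix(p_afr: float, p_eur: float, w: float) -> float:
    """Mixture allele frequency: w * p_A + (1 - w) * p_E."""
    return w * p_afr + (1.0 - w) * p_eur


def relative_variance(p: float, alpha: float) -> float:
    """Relative variance, proportional to [p(1 - p)]**alpha."""
    if not 0.0 < p < 1.0:
        raise ValueError("allele frequency must lie strictly between 0 and 1")
    return (p * (1.0 - p)) ** alpha


# Point estimates quoted in the abstract.
W, ALPHA_MIX = 0.96, -0.34

# Two hypothetical SNPs with the same European MAF (0.20) but different
# African MAF; with alpha < 0, the SNP that is rarer in Africa is expected
# to have the larger per-allele effect-size variance.
snp_rare_in_afr = relative_variance(p_mix(0.05, 0.20, W), ALPHA_MIX)
snp_common_in_afr = relative_variance(p_mix(0.40, 0.20, W), ALPHA_MIX)
```

Setting alpha to 0 recovers the null model without MAF dependence, since every SNP then has the same relative variance of 1.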
Hwang, L.-D.; Lin, C.; Evans, D. M.; Martin, N. G.; Reed, D. R.; Joseph, P. V.
Background: Mendelian randomization (MR) is increasingly used for causal inference in nutritional epidemiology; however, dietary MR studies often rely on instruments statistically selected from genome-wide association studies of self-reported intake, which are vulnerable to pleiotropy and reverse causation and may violate core MR assumptions. We aimed to develop and evaluate a biologically informed framework for selecting valid genetic instruments for dietary exposures, based on genes encoding taste and olfactory receptors that mediate chemosensory inputs and shape food preferences and dietary behaviour.
Methods: We prioritised 1,214 nonsynonymous variants in 30 taste and 295 olfactory receptor genes with minor allele frequency ≥1%. Associations with 140 food-liking traits were tested in UK Biobank participants aged 37 to 73 years. Candidate variants were evaluated using a multi-stage filtering pipeline designed to improve instrument validity. This included replication in an independent younger cohort (Avon Longitudinal Study of Parents and Children, age 25), concordance between food liking and intake, exclusion of associations with socioeconomic status, assessment of food specificity accounting for linkage disequilibrium and co-consumption patterns, and directionality testing to reduce reverse causation. Retained variants were applied as instruments in MR analyses to assess cardiometabolic outcomes.
Results: We identified 268 nonsynonymous variants within 101 olfactory and 16 taste receptor genes associated with 96 food-liking traits. The filtering process yielded 28 candidate instruments for 24 foods. Among these, only the instrument for onion liking satisfied all criteria for high-confidence classification. To demonstrate clinical relevance, genetically proxied onion liking was associated with lower blood pressure and a reduced risk of type 2 diabetes in MR analyses, with no evidence of effects on body mass index, glycaemic traits, or serum lipid levels.
Conclusions: Guiding genetic instrument selection using chemosensory receptor genes provides a biologically informed strategy for dietary Mendelian randomization that reduces susceptibility to pleiotropy and reverse causation. This framework enables more robust causal evaluation of diet-disease relationships and strengthens inference in nutritional epidemiology and public health research.
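With a single retained instrument, such as the one for onion liking, the textbook MR estimator is the Wald ratio: the SNP-outcome association divided by the SNP-exposure association. The abstract does not state which estimator was used, so the Python sketch below is only an illustration of that standard estimator, with a first-order standard error that ignores uncertainty in the SNP-exposure association; all summary statistics in it are hypothetical.

```python
import math


def wald_ratio(beta_gx: float, beta_gy: float, se_gy: float) -> tuple[float, float]:
    """Wald ratio estimate and first-order standard error for one SNP.

    beta_gx: SNP-exposure association (e.g. SNP effect on food liking)
    beta_gy: SNP-outcome association (e.g. SNP effect on blood pressure)
    se_gy:   standard error of beta_gy
    The first-order SE treats beta_gx as known (no delta-method term for it).
    """
    if beta_gx == 0:
        raise ValueError("instrument has no association with the exposure")
    return beta_gy / beta_gx, se_gy / abs(beta_gx)


def two_sided_p(estimate: float, se: float) -> float:
    """Two-sided p-value from a normal approximation."""
    z = abs(estimate / se)
    return math.erfc(z / math.sqrt(2.0))


# Hypothetical summary statistics, not taken from the study.
est, se = wald_ratio(beta_gx=0.10, beta_gy=-0.02, se_gy=0.005)
p = two_sided_p(est, se)
```

A negative estimate here would correspond to higher genetically proxied liking being associated with a lower outcome value; the filtering steps described in the Methods are what the framework relies on to make such an estimate credible.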
Herold, J. M.; Wiegrebe, S.; Thorand, B.; Winkler, T. W.; Gieger, C.; Hartig, F.; Behr, M.; Peters, A.; Kuechenhoff, H.; Heid, I. M.
Polygenic scores (PGSs) are widely used to summarize the joint genetic effects on disease-related traits. However, while age-dependent genetic effects are increasingly recognized, their integration into PGSs remains underexplored. Kidney function, assessed by estimated glomerular filtration rate (eGFR), shows strong age-related genetic effects, and prediction of kidney function decline is an unmet need. We develop an age-informative PGS for quantitative traits by generating age-specific weights from main and interaction effects, and we compare its performance with that of the age-agnostic PGS, both in theory and in real eGFR data. We test PGSs across 282 kidney function SNPs in cross-sectional and longitudinal data from UK Biobank (n = 348,275, m = 1,520,382) and from independent population-based individuals aged 25 to 98 years (KORA&AugUR; n = 9,057, m = 16,804). In theory and in real data, we illustrate that ignoring age mis-specifies genetic effects. The age-informative PGS performs better than the age-agnostic PGS in young and old individuals (KORA&AugUR: 6.3% versus 5.9% of eGFR variance explained in 25- to 45-year-olds; 2.3% versus 1.8% in 75- to 98-year-olds). The PGS based on interaction effects explains more of the variability in eGFR decline than the age-agnostic PGS. The highest versus lowest PGS quintiles predict an eGFR decline of -0.88 (95% CI = [-0.93; -0.83]) versus -0.75 mL/min/1.73 m² (95% CI = [-0.79; -0.71]) at the population level, a difference similar to that between strata defined by acquired risk factors such as diabetes, obesity, or albuminuria. Prediction of eGFR decline at the individual level remains challenging with either genetic or acquired risk factors. Overall, we provide a simple approach to constructing an age-informative PGS for quantitative disease traits and illustrate its opportunities and challenges for predicting kidney function and kidney function decline.
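The abstract describes age-specific weights built from main and interaction effects but does not give the exact parameterisation. The Python sketch below assumes a linear SNP-by-age interaction centred at a hypothetical reference age of 50 years, so each SNP's weight at a given age is main + interaction × (age - 50). All names, dosages, and weights are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnpWeights:
    main: float         # allele effect at the (hypothetical) reference age
    interaction: float  # change in the allele effect per year of age


def age_specific_weight(w: SnpWeights, age: float, age_centre: float) -> float:
    """Weight for one SNP at a given age: main + interaction * (age - age_centre)."""
    return w.main + w.interaction * (age - age_centre)


def age_informative_pgs(dosages: list[float], weights: list[SnpWeights],
                        age: float, age_centre: float = 50.0) -> float:
    """Sum over SNPs of allele dosage times the age-specific weight."""
    if len(dosages) != len(weights):
        raise ValueError("one dosage per SNP is required")
    return sum(g * age_specific_weight(w, age, age_centre)
               for g, w in zip(dosages, weights))


def interaction_pgs(dosages: list[float], weights: list[SnpWeights]) -> float:
    """Score from interaction effects only: predicted trait change per year of age."""
    return sum(g * w.interaction for g, w in zip(dosages, weights))


# Two hypothetical SNPs: the first has an effect that strengthens with age.
dosages = [2.0, 1.0]
weights = [SnpWeights(main=-0.5, interaction=-0.01),
           SnpWeights(main=0.3, interaction=0.0)]

score_50 = age_informative_pgs(dosages, weights, age=50)
score_70 = age_informative_pgs(dosages, weights, age=70)
slope = interaction_pgs(dosages, weights)
```

Under this linear assumption, the difference between an individual's scores at two ages equals the interaction-only score times the age gap, which is one way to see why a PGS built from interaction effects can carry information about the rate of decline.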
Otsuka-Yamasaki, Y.; Sutoh, Y.; Hachiya, T.; Nakao, M.; Minabe, S.; Komaki, S.; Ohmomo, H.; Sasaki, M.; Shimizu, A.
Osteoporosis and fractures are major health concerns. We developed and validated a polygenic score (PGS) for osteoporosis in a Japanese population using heel quantitative ultrasound-derived T-scores. A genome-wide association study of 12,371 participants in the Tohoku Medical Megabank Community-Based Cohort identified genome-wide significant loci, including MBL2, TMEM135, and WNT16. PGS models were constructed and evaluated using independent datasets for model selection (n = 1,419) and validation (n = 8,711). Adding the PGS to age and sex yielded modest improvements in discrimination but supported genetic risk stratification. Compared with the intermediate group, the lowest PGS quintile (bottom 20%, genetically high-risk) had higher odds of osteoporosis (OR = 1.22, 95% confidence interval (CI): 1.07-1.40), whereas the highest PGS quintile (top 20%, genetically low-risk) had lower odds of osteoporosis (OR = 0.85, 95% CI: 0.71-0.98). Prospective follow-up (mean 3.4 years) showed a similar gradient for incident osteoporosis, with a higher incidence rate ratio in the high-risk group (1.42, 95% CI: 1.17-1.73) and a lower incidence rate ratio in the low-risk group (0.70, 95% CI: 0.54-0.89). Age-stratified analyses revealed no significant age-PGS interaction and no differences in the slope of age-related T-score declines across genetic risk groups. Observed T-scores in young adults (20-44 years) and extrapolation to age 20 suggested lower bone status around peak bone mass among genetically high-risk individuals. These findings indicate that a Japanese-specific PGS can stratify osteoporosis risk and may help identify individuals at elevated genetic risk earlier in adulthood.
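Because a higher PGS here predicts a higher T-score, the risk labels run opposite to the score: the bottom quintile is genetically high-risk and the top quintile genetically low-risk. A minimal Python sketch of that grouping, assuming (not stated in the abstract) that the "intermediate group" is the middle three quintiles. The crude odds ratio is a simplification; the published ORs were presumably estimated with covariate adjustment, and the counts below are hypothetical.

```python
import statistics


def pgs_risk_groups(scores: list[float]) -> list[str]:
    """Label individuals by PGS quintile.

    A higher PGS predicts a higher T-score (better bone status), so the
    bottom quintile is 'high-risk', the top quintile 'low-risk', and the
    middle three quintiles are 'intermediate' (an assumption of this sketch).
    """
    q20, _, _, q80 = statistics.quantiles(scores, n=5)  # four quintile cut points
    labels = []
    for s in scores:
        if s <= q20:
            labels.append("high-risk")
        elif s > q80:
            labels.append("low-risk")
        else:
            labels.append("intermediate")
    return labels


def odds_ratio(cases_group: int, controls_group: int,
               cases_ref: int, controls_ref: int) -> float:
    """Crude (unadjusted) odds ratio of a PGS group versus the reference group."""
    return (cases_group * controls_ref) / (controls_group * cases_ref)


# Ten hypothetical individuals with evenly spaced scores.
labels = pgs_risk_groups([float(x) for x in range(1, 11)])
```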
Wu, S.; Hou, L.; Yuan, Z.; Sun, X.; Yu, Y.; Chen, H.; Huang, L.; Li, H.; Xue, F.
The integration of causal effect estimates from multiple Mendelian randomization studies has become increasingly popular. However, overlap among the underlying GWAS databases compromises traditional meta-analysis, leading to inflated variance and reduced statistical power. Here, we propose JointMR, a joint likelihood-based approach designed to integrate multiple GWAS summary databases while explicitly accounting for the covariance matrix of the Wald ratio estimates. To accommodate potential cross-study heterogeneity, JointMR incorporates both fixed-effect and random-effects models. Simulations demonstrated that JointMR provides unbiased estimates with higher statistical power and better Type I error control than conventional meta-analysis of standard MR estimates (e.g., IVW), especially as the correlation between databases increases. In a real-data application examining the effects of total cholesterol, HDL-C, LDL-C, and triglycerides on type 2 diabetes, JointMR resolved contradictions seen in standard approaches, generating stable and biologically plausible estimates. In conclusion, JointMR overcomes critical limitations of existing methods, offering a more powerful and reliable tool for robust causal inference from the growing repository of GWAS summary statistics.
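JointMR itself is a joint likelihood method whose details the abstract does not give. As a much simpler illustration of why the covariance of correlated estimates matters, the Python sketch below pools several estimates b of the same causal effect by fixed-effect generalized least squares with a known covariance matrix Σ: the pooled estimate is (1ᵀΣ⁻¹b)/(1ᵀΣ⁻¹1) with variance 1/(1ᵀΣ⁻¹1). This is a swapped-in textbook technique, not JointMR; the covariance values are hypothetical, and the random-effects variant is not shown.

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("covariance matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fixed_effect_gls(estimates: list[float],
                     cov: list[list[float]]) -> tuple[float, float]:
    """Pool correlated estimates of one effect; returns (estimate, variance)."""
    s_inv_one = solve(cov, [1.0] * len(estimates))  # Sigma^-1 1 (Sigma symmetric)
    precision = sum(s_inv_one)                      # 1' Sigma^-1 1
    pooled = sum(w * b for w, b in zip(s_inv_one, estimates)) / precision
    return pooled, 1.0 / precision


# Hypothetical Wald-ratio-based estimates from two overlapping databases.
b = [0.2, 0.3]
independent = fixed_effect_gls(b, [[0.01, 0.0], [0.0, 0.01]])
correlated = fixed_effect_gls(b, [[0.01, 0.005], [0.005, 0.01]])
```

With a diagonal Σ, the weights reduce to ordinary inverse-variance weighting; off-diagonal terms, such as those induced by overlapping samples, change both the weights and the pooled variance.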